In this tutorial we will go through some of the possibilities offered by ‘ggtree’ to visualize phylogenetic trees, format them and add additional information to the plot.
‘ggtree’ is an extension of ggplot2 and, as such, it offers many possibilities to format, manipulate and map data to phylogenetic trees.
Additional information available in: https://bioconductor.org/packages/release/bioc/html/ggtree.html
Additional tutorials and examples:
https://yulab-smu.top/treedata-book/index.html
https://xiayh17.gitee.io/treedata-book/index.html
https://github.com/GuangchuangYu/ggtree-current-protocols
https://github.com/fionarhuang/TreeHeatmap
References:
G Yu. Using ggtree to visualize data on tree-like structures. Current Protocols in Bioinformatics, 2020, 69:e96. doi: 10.1002/cpbi.96.
LG Wang, TTY Lam, S Xu, Z Dai, L Zhou, T Feng, P Guo, CW Dunn, BR Jones, T Bradley, H Zhu, Y Guan, Y Jiang, G Yu*. treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Molecular Biology and Evolution. 2020, 37(2):599-603. doi: 10.1093/molbev/msz240.
G Yu, TTY Lam, H Zhu, Y Guan. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution. 2018, 35(2):3041-3043. doi: 10.1093/molbev/msy194.
G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017, 8(1):28-36. doi: 10.1111/2041-210X.12628.
In order to install ‘ggtree’, you need BiocManager. Additionally, we will also use ‘ape’ (Analyses of Phylogenetics and Evolution).
#if (!require("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#if (!require("ggtree", quietly = TRUE))
# BiocManager::install("ggtree")
library(ggtree)
library(ggtreeExtra)
library(ape)
library(ggplot2)
library(ggtext)
library(ggforce)
library(reshape2)
library(ggimage)
library(grid)
library(tidytree)
library(phylobase)
library(ggstance)
library(ggjoy)
library(dplyr)
In my working directory, I have a folder for phylogenetic trees (“trees”), data (“data”), output figures (“figures”) and some microscopic photos (“photos”).
setwd("F:/Home_office_2020-03-16/Trichosporonales_project/Courses/ggtree_workshop/")
load("workshop_ggtree_session.RData")
We start by importing a phylogenetic tree, previously calculated using RAxML. I will call it “tree”. Additionally, we will also import some data we have available for our dataset (“species_info”). This information can be numeric (e.g. size, temperature, etc) or categorical (e.g. genus name, host plant, accession number, etc). We will combine and visualize this information later, together with the phylogeny.
Important: any data imported (e.g. species_info) must be converted into a data frame. Otherwise, the mapping will not work.
tree = read.tree("trees/RAxML.tree")
species_info = read.delim("data/species_info.txt")
species_info = as.data.frame(species_info)
The phylogeny and data will be combined based on taxa IDs. So, make sure the IDs are the same in all datasets. The species order in the file, does not need to be the same.
Taxa included in the phylogeny and in the dataset do not need to be the same. Only the species present both in the tree and species_info will be considered. Likewise, species missing in one of the datasets will be ignored.
This means, if you have information for several species which are not included in the phylogeny, these data will be ignored (you do not need to create a subset of your data). If you have species in the phylogeny, for which you do not have information, these species will be present in the phylogeny, but no data will be added.
This is a tab delimited table. For each species, I have the ID (same as in the phylogeny), the genus and species, the strain ID (may be the same or different from the first ID), the “host” from which they were collected, the “coolness” of our species and an accession number (e.g. for ITS).
head(species_info)
## ID Genus species Genus_species strain host coolness AccNumb
## 1 A1 Species red Species red A1 Plant A cool LR881294
## 2 A2 Species red Species red A2 Plant B cool LR881297
## 3 A3 Species red Species red A3 Plant B cool LR881298
## 4 B Species blue Species blue B Plant C cool LR881299
## 5 C Species green Species green C Plant C not cool LR881302
## 6 D Species black Species black D Plant C not cool LR881303
We can manipulate the data to include or combine different variables. For example, we can automatically abbreviate the genus name (while avoiding having different genera with the same abbreviation) and further combine it with the species name. This is useful to avoid having long and repetitive names and to edit the names manually, which sometimes can introduce errors (e.g. same abbreviation for 2 different names).
species_info$Gen = abbreviate(species_info$Genus, minlength = 2, dot = T, method = "left.kept")
species_info$Gen_species = paste(species_info$Gen, species_info$species)
head(species_info)
## ID Genus species Genus_species strain host coolness AccNumb Gen
## 1 A1 Species red Species red A1 Plant A cool LR881294 Sp.
## 2 A2 Species red Species red A2 Plant B cool LR881297 Sp.
## 3 A3 Species red Species red A3 Plant B cool LR881298 Sp.
## 4 B Species blue Species blue B Plant C cool LR881299 Sp.
## 5 C Species green Species green C Plant C not cool LR881302 Sp.
## 6 D Species black Species black D Plant C not cool LR881303 Sp.
## Gen_species
## 1 Sp. red
## 2 Sp. red
## 3 Sp. red
## 4 Sp. blue
## 5 Sp. green
## 6 Sp. black
The default tree is just the skeleton and does not include any additional information. Everything in addition will need to be requested.
ggtree(tree)
You have many different possible layouts and styles. These include rectangular trees, rounded corners, circular, dendograms, networks, etc. You may also rotate and change angles and whether to consider branch lengths or not. All options are available in the description of the ggtree.
ggtree(tree)+
ggtree(tree, layout="roundrect")+
ggtree(tree, layout="slanted")+
ggtree(tree, layout="ellipse")+
ggtree(tree, layout="circular")+
ggtree(tree, layout="fan", open.angle=120)+
ggtree(tree, layout="equal_angle")+
ggtree(tree, layout="daylight")+
ggtree(tree, branch.length='none')+
ggtree(tree, layout="ellipse", branch.length="none")+
ggtree(tree, branch.length='none', layout='circular')+
ggtree(tree, layout="daylight", branch.length = 'none')
The default tree looks like this:
ggtree(tree)
We will start by increase the thickness of the branches
ggtree(tree, size = 1.5)
We may also change the type of line.
ggtree(tree, size = 1.5, linetype="dotted")
And choose a specific colour for the lines.
ggtree(tree, size = 1.5, colour="red", linetype="dotted")
We can also colour the branches according to data. In this case, we will colour according to “branch length”. You may also choose more useful information, such as mutation rates, number of positively selected genes, molecular clocks, etc.
ggtree(tree, branch.length = "branch.length", size = 1.5, aes(color=branch.length)) +
scale_color_continuous(low='dodgerblue2', high='red2') +
theme(legend.position="right")+
labs(color='Branch length')
We may also colour according to groups (e.g. genus). For this, we define a group of clades, based on node ID (we will learn later how to find out the node IDs) and colour branches by group.
groups <- groupClade(tree, .node=c(51, 36))
ggtree(groups, aes(color=group), size = 1.5)+
scale_color_discrete(name = "Genus", labels = c("Genus A", "Genus B", "Genus C"))
Similar to ggplot, we can save the formatted tree as a variable in R, so we can later easily visualize it or add things to ‘p’ without having to run the code again. We will save the following tree as ‘p’:
p = ggtree(tree, size = 1)
p
The default plot does not include a scale. This is important to add, to give sense to the branch lengths. Since it is not drawn by default, we need to specify it.
p + geom_treescale()
We can adjust the position of the scale, based on x/y coordinates. The tree will adjust itself to the new position of the scale.
p + geom_treescale(x=0.5, y=0)
p + geom_treescale(x=0, y=30)
p + geom_treescale(x=0.8, y=30)
p + geom_treescale(x=0.8, y=15)
p + geom_treescale(x=0.8, y=0)
p + geom_treescale(x=0, y=-1)
We can also format the font size, thickness of the line, colours of the text and line, etc.
p + geom_treescale(x=0, y=-1, fontsize = 3)
p + geom_treescale(x=0, y=-1, fontsize = 3, linesize=1)
Once you find the right format, save it as ‘p’.
p=p + geom_treescale(x=0, y=-1, fontsize = 3, linesize=1)
p
By default, the names of the taxa are also not included. This is another vital information we need to add.
p + geom_tiplab()
In some cases, it might be useful to align the species names on the right. The dots leading to the names are automatic.
p + geom_tiplab(align = TRUE)
For the remaining tutorial, I will continue using the names unaligned and next to the branch.
p + geom_tiplab()
currently, we have the default IDs of the phylogenetic tree. We can replace these by any information we have available on the species_info dataset. We will add now the genus and species names.
This is done by providing the tree plot (p) and combine it with the dataset we want to map (p %<+% species_info). Note the special symbol (%<+%). This symbol adds annotation data to a tree.
p %<+% species_info +
geom_tiplab(aes(label = factor(Genus_species)))
The names are too long for the plot. Since we have very long branches, We can comprise the tree to make it more compact, while preserving the branch length information and adjusting the scale, accordingly.
p %<+% species_info +
geom_tiplab(aes(label = factor(Genus_species))) +
xlim(0,1)
Usually, species names should be in italic. We can easily format the names as well.
p %<+% species_info +
geom_tiplab(aes(label = factor(Genus_species), fontface = "italic")) +
xlim(0,1)
We may also change the colour and size, to our preference and needs.
p %<+% species_info +
geom_tiplab(aes(label = factor(Genus_species), fontface = "italic"), colour = "blue", size = 5) +
xlim(0,1)
Or we can colour the names according to relevant information. In this case, we will colour based on genus names (present in species_info dataset)
p %<+% species_info +
geom_tiplab(aes(label = factor(Genus_species), fontface = "italic", colour = Genus), size = 5) +
xlim(0,1)
We may also combine different information on the labels. For example, genus name, species name and strain ID. Each can be formatted differently.
p %<+% species_info +
geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
xlim(0,1)
p %<+% species_info +
geom_tiplab(aes(label=paste0('italic(', Gen, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
xlim(0,1)
We may also combine information with custom text. In this case, we print the accession numbers between parentheses.
p %<+% species_info +
geom_tiplab(aes(label=paste0('italic(', Gen, ')~italic(', species, ')~bold(', strain, ')~plain(', "(", AccNumb, ")",')')), parse=T)+
xlim(0,1)
Or we can, for example, also add the hosts where the strains were isolated from. In this case, we combine species information, custom text (“in host”) and individual formatting for each (plain, bold and italic). We might need to adjust the space between factors with the ‘offset’ parameter.
p %<+% species_info +
geom_tiplab() +
geom_tiplab(aes(label = "in host", fontface = "bold"), offset = 0.04) +
geom_tiplab(aes(label = factor(host), fontface = "italic"), offset=0.14)+
xlim(0,1)
# Full genus name
p2 = p %<+% species_info +
geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
xlim(0,1)
# Abbreviated genus name
p3 = p %<+% species_info +
geom_tiplab(aes(label=paste0('italic(', Gen, ')~italic(', species, ')~bold(', strain, ')~plain(', "(", AccNumb, ")",')')), parse=T)+
xlim(0,1)
Another important missing information, is the node support or bootstrap.
p2
The default placement and formatting might not be the best:
p2 + geom_nodelab()
We have many options to format the text, but we’re mainly interested in changing its font size
p2 + geom_nodelab(size=2.5)
and it’s relative position to the nodes.
p2 + geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)
Now we save the tree as ‘p4’
p4 = p2 + geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)
Each node has a unique ID number. We can address these node numbers to apply additional formatting (e.g. colours) to clades deriving from this specific node.
ggtree(tree) %<+% species_info +
geom_text(aes(label=node), hjust=-.3, size=3)
Now we will format some clades.
p4
We can flip/rotate branches and clades. In this case, we specify that we flip (change the order) node 36 and node 51.
flip(tree_view = p4, node1 = 36, node2 = 51)
By using node numbers, we can also specify new roots to the tree. We now selected node 51 as the new tree.
In this case, we need to recompute the previous steps. We cannot flip an existing phylo object (i.e. a tree saved as a plot). We need to reroot the tree we read in the beginning (tree).
We can select the new root with root() and by specifying the node number. In addition, we draw already the new tree (by including all commands within ggtree().
rerooted_tree = ggtree(root(tree, node = 51, edgelabel = TRUE), size=1) +
geom_treescale(x=0, y=-1, fontsize = 3, linesize=1)
Now, we can remap the data to this rerooted tree, as before.
rerooted_tree %<+% species_info +
geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)+
xlim(0,1)
We have several strains for the same species (e.g. A1, A2, A3).
p4
We may remove taxa from the phylogeny. The branch lengths will be recalculated to reflect this, so the tips have the same total length, from the last common node.
We can specify which taxa we want to drop, by listing their IDs. In this case we will exclude multiple strains for the same species.
Then, we will plot the resulting tree and format it, as before.
edited_tree = drop.tip(tree, c("N2", "N3", "R2", "U2", "U3", "A2", "A3", "P2", "I2"))
edited_tree_p = ggtree(edited_tree, size = 1) +
geom_treescale(x=0, y=-1, fontsize = 3, linesize=1)
edited_tree_p %<+% species_info +
geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)+
xlim(0,1)
We can save and export the edited tree. This will be in ‘newick’ format, just like our original output from RAxML.
write.tree(edited_tree, "trees/my_edited.tree")
We might need to annotate some clades.
p2 + geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)
We will delimit a clade, based on node number, and add a label to it. The labels may be formatted independentely.
p5=p2 + geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)+xlim(0,1.3)+
geom_cladelabel(node = 51, label = "Species clade", barsize = 2, align = F, offset = 0.3, offset.text = 0.02, horizontal = F, angle = -90, hjust=0.5)+
geom_cladelabel(node = 36, label = "Something clade", barsize = 2, align = F, offset = 0.35, offset.text = 0.01, horizontal = TRUE)
p5
Additionally, we might want to collapse monophyletic clades which we might not need to fully represent. This is accomplished by node number, as well. Each collapsed group, can have different filling colours. “Fill” addresses the filling colour, and “colour” the contour of the shape. Additional formatting parameters might be added (e.g. size, alpha, etc)
p5 %>% collapse(node = 61, 'max', fill = "black") %>%
collapse(node = 55, 'max', fill = "grey", colour = "black")
We can label these collapsed clades, as well and format them accordingly.
p5 %>% collapse(node = 61, 'max', fill = "black") %>%
collapse(node = 55, 'max', fill = "grey", colour = "black") +
geom_cladelabel(node = 61, label = "Collapsed group 1", barsize = 0, align = F, offset = 0.13, horizontal = T, fontface = "bold", colour = "black")+
geom_cladelabel(node = 55, label = "Collapsed group 2", barsize = 0, align = F, offset = 0.23, horizontal = T, fontface = "bold", colour = "azure4")
Specific clades can be extracted and formatted from the original tree. This is based on node number, as well.
clade_57 = viewClade(ggtree(tree, color="red", linetype="dotted", size=2), node = 57)+
geom_tiplab()+geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)+
ggtitle("subset of the Species clade")
clade_57
clade_37 = viewClade(ggtree(tree, color="blue", size = 2), node = 37)+
geom_tiplab()+geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)+
ggtitle("subset of the Something clade")
clade_37
We may plot different trees or clades together in the same figure. This can be done by different packages.
p6=p4 + geom_highlight(node = 57, alpha = 0.2, fill="red") +
geom_highlight(node = 37, alpha = 0.2, fill="blue") +
xlim(0,1.5) +
ggtitle("Best phylogeny ever")
p6 + clade_57 / clade_37
When we are looking at specific groups and taxa, it might be useful to add some images to them.
clade_37
Sometimes, we might want to add macro or microscopic photos, a cartoon of the host, etc. These are mapped to the taxa ID, based on the image file name. The folder should contain images for every taxa (otherwise it will not work) and each file, should have the same “label” as the taxa. The size, scale and transparency of the figures can be formatted.
viewClade(ggtree(tree, color="blue", size = 2), node = 37, xmax_adjust = 0.3)+
geom_nodelab(size=2.5, hjust = 1.5, vjust=-0.5)+
geom_tiplab()+
geom_tiplab(aes(image=paste0('photos/', label, '.jpg')),
geom="image", offset=0.05, size=0.15, alpha = 1)
Sometimes, adding information to the tips can be useful.
ggtree(tree, size = 1) %<+% species_info +
geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
xlim(0,1)
This can include colouring the species names based on some variable information, such as coolness, origin, etc.
ggtree(tree, size = 1) %<+% species_info +
geom_tiplab(aes(color=coolness, label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T)+
xlim(0,1)
Or adding a shape at the tip, and adding a colour code instead.
ggtree(tree, size = 1) %<+% species_info +
geom_tippoint(aes(color=coolness), size=3,position = position_nudge(x = 0.01))+
geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T, hjust = -0.1)+
xlim(0,1)
Additionally, we can combine shapes and colour to combine different variables, like host and coolness.
ggtree(tree, size = 1) %<+% species_info +
geom_tippoint(aes(shape=host, color=coolness), size=3,position = position_nudge(x = 0.01))+
geom_tiplab(aes(label=paste0('italic(', Genus, ')~italic(', species, ')~bold(', strain, ')')), parse=T, hjust = -0.1)+
xlim(0,1)
We will now save and use the following tree and add additional information to it.
p7 = ggtree(tree, size=1) %<+% species_info +
geom_tippoint(aes(color=coolness), size=3, position = position_nudge(x = 0.01))+
geom_tiplab(hjust = -1)
p7
We will now import several independent tables and use them to add information in different panels to our phylogeny.
Any imported table MUST be specified as.data.frame. Otherwise, mapping will not be accurate and no warning will be produced.
We will import general genome statistics, number of genes, SNP location, genotypic information and results from STRUCTURE analyses (two different K values).
genome_stats = read.delim("data/genome_data.txt")
genome_stats = as.data.frame(genome_stats)
genome_stats2 = read.delim("data/genome_data2.txt")
genome_stats2 = as.data.frame(genome_stats2)
genes = read.delim("data/genes.txt")
genes = as.data.frame(genes)
SNP = read.delim("data/SNP.txt")
SNP = as.data.frame(SNP)
genotype = read.delim("data/genotype.txt", header = 1, row.names = 1)
genotype = as.data.frame(genotype)
structurek4 = read.delim("data/structure_K4.txt")
structurek4 = as.data.frame(structurek4)
structurek5 = read.delim("data/structure_K5.txt")
structurek5 = as.data.frame(structurek5)
A nice colour scheme is always useful.
colours25 <- c("dodgerblue2", "#E31A1C", "green4", "#6A3D9A", "#FF7F00", "black", "gold1", "skyblue2", "#FB9A99", "palegreen2", "#CAB2D6", "#FDBF6F", "gray70", "khaki2", "maroon", "orchid1", "deeppink1", "blue1", "steelblue4", "darkturquoise", "green1", "yellow4", "yellow3", "darkorange4", "brown")
First, we will add a heatmap, with the genotype information.
gheatmap(p7, genotype, width=0.6,
colnames=TRUE, font.size=3,
colnames_angle=-45, hjust=0, offset = 0.1) +
scale_x_ggtree() +
scale_y_continuous(expand=c(0, 1))
To add structure information, we will need to format the matrix, into a 3-column table. This can achieved with ‘melt’ from reshape2 package. We will also add column names to the resulting table.
melt_structurek4 = melt(structurek4, id.vars = "Ind")
colnames(melt_structurek4) = c("id", "population", "value")
melt_structurek5 = melt(structurek5, id.vars = "Ind")
colnames(melt_structurek5) = c("id", "population", "value")
pstruct = p7 + geom_facet(panel ='K4',
data = melt_structurek4,
geom = geom_bar,
stat = "identity",
position = position_stack(reverse = TRUE),
aes(x = value, fill=population),
orientation = 'y',
width = .6) +
scale_fill_manual(values=colours25)+
theme_tree2()
pstruct
pstruct2 = pstruct + geom_facet(panel ='K5',
data = melt_structurek5,
geom = geom_bar,
stat = "identity",
position = position_stack(reverse = TRUE),
aes(x = value, fill=population),
orientation = 'y',
width = .6) +
scale_fill_manual(values=colours25)+
theme_tree2()+
theme(legend.position = "")
pstruct2
dev.off()
## null device
## 1
print(pstruct2, vp=viewport(angle=90))
pdf("figures/structure.pdf")
print(pstruct2, vp=viewport(angle=90))
dev.off()
## png
## 2
p7
p7 + geom_facet(panel ='Size',
data = genome_stats2,
geom = geom_boxploth,
aes(x = Chr_size/1000000, group=label, fill=coolness)) +
theme_tree2()
p7 + geom_facet(panel ='Distribution',
data = genome_stats2,
geom = geom_joy,
aes(x = Chr_size, group=label, fill=coolness),
color='grey80', lwd=.3) +
theme_tree2()
p7 + geom_facet(panel ='Genome size (Mbp)',
data = genome_stats,
geom = geom_point,
aes(x=0, size = Genome_size/1000000),
fill="black",
orientation = 'y',
width = .6) +
theme_tree2()
p7 + geom_facet(panel ='Genome size (Mbp)',
data = genome_stats,
geom = geom_point,
aes(x=0, size = Genome_size/1000000),
fill="black",
orientation = 'y',
width = .6)+
xlim(0,0.8)
p7 + geom_facet(panel ='Genome size (Mbp)',
data = genome_stats2,
geom = geom_bar,
stat = "identity",
position = position_stack(reverse = TRUE),
aes(x = Chr_size/1000000, fill=Chr),
orientation = 'y',
width = .6) +
scale_fill_manual(values=colours25)+
theme_tree2()
p8 = p7 + geom_facet(panel ='Genome size (Mbp)',
data = genome_stats,
geom = geom_col,
aes(x = Genome_size/1000000),
fill="black",
orientation = 'y',
width = .6) +
theme_tree2()
p8
p9 = p8 + geom_facet(panel ='Proteome',
data = genes,
geom = geom_bar,
stat = "identity",
position = 'stack',
aes(x = genes, fill=SP),
colour = "black",
orientation = 'y',
width = .6) +
theme_tree2()+
theme(legend.position = "right",
axis.text = element_text(colour = "black"),
axis.ticks = element_line(colour = "black"))+
scale_fill_manual(values=c("Non-secreted proteins" = "darkgrey","Secreted proteins" = "white"))
p9
p10 = p9 + geom_facet(panel = "SNP",
data = SNP,
geom = geom_point,
mapping=aes(x = pos, color=coolness),
shape = '|',
size=5)+
theme(legend.position = "bottom")+
xlim_tree(1)
p10
p11=facet_widths(p10, c(Tree = 1, "Genome size (Mbp)" = 0.2, Proteome = 0.5, SNP = 2))
p11
pdf("figures/MyStudy.pdf", width = 13, height = 8)
p11
dev.off()
## png
## 2
tiff("figures/MyStudy.tiff", width = 13, height = 8, res = 100, units = "in")
p11
dev.off()
## png
## 2
pc = ggtree(tree, layout = 'circular', size=1)
pc
pc2 = pc %<+% species_info +
geom_tippoint(aes(color=coolness), size=3,position = position_nudge(x = 0.01))+
geom_tiplab(offset = 0.03)
pc2
pc3 = pc2 +
geom_fruit(
data=genome_stats,
geom=geom_col,
mapping = aes(x=Genome_size, y=ID),
fill="black",
orientation = 'y',
offset = 0.2)
pc3
pc4 = pc3 +
geom_fruit(
data=genome_stats2,
geom=geom_col,
mapping = aes(x=Chr_size, y=ID, fill=Chr),
orientation = 'y')+
scale_fill_manual(values=colours25)
pc4
pc5 = pc3 +
geom_fruit(
data=genes,
geom = geom_bar,
mapping = aes(x = genes, y=id, fill=SP),
colour = "black",
orientation = 'y',
stat="identity")+
scale_fill_manual(values=c("Non-secreted proteins" = "darkgrey","Secreted proteins" = "black"))
pc5
rotate_tree(pc5, 265)
rotate_tree(pc5, 265) + geom_highlight(node = 57, alpha = 0.3, fill="red") +
geom_highlight(node = 37, alpha = 0.3, fill="blue")
pc6 = rotate_tree(pc2, 265)
gheatmap(pc6, genotype, offset=0.12, width=0.6, colnames_offset_y = 0, font.size=3)
gheatmap(pc6, genotype, offset=0.12, width=0.6, colnames_offset_y = 0, font.size=3)+
gheatmap(p7, genotype, width=0.6,
colnames=TRUE, font.size=3,
colnames_angle=-45, hjust=0, offset = 0.1) +
scale_x_ggtree() +
scale_y_continuous(expand=c(0, 1))
tree1 = read.tree("trees/RAxML.tree")
tree2 = read.tree("trees/RAxML_gene2.tree")
p1 <- ggtree(tree1)
p2 <- ggtree(tree2)
p1 + geom_tiplab() + annotate("text", x=0.1, y=33, label = "Gene 1", fontface = "bold") +
p2 + geom_tiplab() + annotate("text", x=0.2, y=33, label = "Gene 2", fontface = "bold")
d1 <- p1$data
d2 <- p2$data
## reverse x-axis and
## set offset to make the tree on the right-hand side of the first tree
d2$x <- max(d2$x) - d2$x + max(d1$x) + 1
pp <- p1 + geom_tree(data=d2) +
ggnewscale::new_scale_fill() +
geom_hilight(
data = d2,
mapping = aes(
subset = node %in% c(38, 48, 58),
node=node,
fill=as.factor(node))) +
labs(fill = "clades for tree in right" )
# Filter for connections between tips
dd = bind_rows(d1, d2) %>%
filter(!is.na(label))
dd = subset(dd, dd$isTip == "TRUE")
pp +
geom_tiplab( bg.colour = alpha('firebrick', .5)) +
geom_tiplab(data = d2, hjust=1,
bg.colour = alpha('firebrick', .5))+
annotate("text", x=0.1, y=33, label = "Gene 1", fontface = "bold")+
annotate("text", x=2.4, y=33, label = "Gene 2", fontface = "bold")
pp + geom_line(aes(x, y, group=label), data=dd, color='grey') +
geom_tiplab( bg.colour = alpha('firebrick', .5)) +
geom_tiplab(data = d2, hjust=1,
bg.colour = alpha('firebrick', .5))+
annotate("text", x=0.1, y=33, label = "Gene 1", fontface = "bold")+
annotate("text", x=2.4, y=33, label = "Gene 2", fontface = "bold")
pp + geom_line(aes(x, y, group=label), data=dd, color='grey') +
geom_tiplab( bg.colour = alpha('firebrick', .5)) +
geom_tiplab(data = d2, hjust=1,
bg.colour = alpha('firebrick', .5))+
annotate("text", x=0.1, y=33, label = "Gene 1", fontface = "bold")+
annotate("text", x=2.4, y=33, label = "Gene 2", fontface = "bold")+
geom_highlight(node = 56, alpha = 0.2, fill="red", extendto = 0.6)+
geom_highlight(node = 47, alpha = 0.2, fill="blue", extendto = 0.75)
pp + geom_line(aes(x, y, group=label), data=dd, color='grey') +
geom_tiplab( bg.colour = alpha('firebrick', .5)) +
geom_tiplab(data = d2, hjust=1,
bg.colour = alpha('firebrick', .5))+
annotate("text", x=0.1, y=33, label = "Gene 1", fontface = "bold")+
annotate("text", x=2.4, y=33, label = "Gene 2", fontface = "bold")+
geom_highlight(node = 56, alpha = 0.2, fill="red", extendto = 0.6)+
geom_highlight(node = 47, alpha = 0.2, fill="blue", extendto = 0.75)+
geom_taxalink(taxa1 = 47, taxa2 = 56, color="darkorange2", curvature=-.9, lwd=0.8, ncp = 10, linetype = "dashed",
arrow = arrow(length = unit(0.02, "npc"),
type="closed",
ends = "both"))
pp + geom_line(aes(x, y, group=label), data=dd, color='grey') +
geom_tiplab( bg.colour = alpha('firebrick', .5)) +
geom_tiplab(data = d2, hjust=1,
bg.colour = alpha('firebrick', .5))+
annotate("text", x=0.1, y=33, label = "Gene 1", fontface = "bold")+
annotate("text", x=2.4, y=33, label = "Gene 2", fontface = "bold")+
geom_highlight(node = 56, alpha = 0.2, fill="red", extendto = 0.6)+
geom_highlight(node = 47, alpha = 0.2, fill="blue", extendto = 0.75)+
geom_taxalink(taxa1 = 47, taxa2 = 56, color="darkorange2", curvature=-.9, lwd=0.8, ncp = 10, linetype = "dashed",
arrow = arrow(length = unit(0.02, "npc"),
type="closed",
ends = "both"))+
geom_taxalink(taxa1 = 37, taxa2 = 41, color="mediumblue", curvature=-.45, lwd=0.8, ncp = 10, offset = -0.04)
library(gggenes) # used only to get the example dataset
head(example_genes)
## molecule gene start end strand orientation
## 1 Genome1 genA 15389 17299 reverse 1
## 2 Genome1 genB 17301 18161 forward 0
## 3 Genome1 genC 18176 18640 reverse 1
## 4 Genome1 genD 18641 18985 forward 0
## 5 Genome1 genE 18999 20078 reverse 1
## 6 Genome1 genF 20086 20451 forward 1
get_genes <- function(data, genome) {filter(data, molecule == genome) %>% pull(gene)}
g <- unique(example_genes[,1])
n <- length(g)
d <- matrix(nrow = n, ncol = n)
rownames(d) <- colnames(d) <- g
genes <- lapply(g, get_genes, data = example_genes)
for (i in 1:n) {
for (j in 1:i) {
jaccard_sim <- length(intersect(genes[[i]], genes[[j]])) /
length(union(genes[[i]], genes[[j]]))
d[j, i] <- d[i, j] <- 1 - jaccard_sim
}
}
tree <- ape::bionj(d)
ggtree(tree, branch.length='none') +
geom_tiplab() + xlim_tree(5.5)
ggtree(tree, branch.length='none') +
geom_tiplab() + xlim_tree(5.5) +
geom_facet(mapping = aes(xmin = start, xmax = end, fill = gene),
data = example_genes, geom = geom_motif, panel = 'Alignment',
on = 'genE', label = 'gene', align = 'left') +
scale_fill_brewer(palette = "Set3") +
scale_x_continuous(expand=c(0,0)) +
theme(strip.text=element_blank(),
panel.spacing=unit(0, 'cm'),
axis.text = element_text(size=15))
| Command | Description | Package | Example |
|---|---|---|---|
| read.tree() | Loads tree file. Creates a phylo object. | ape | tree = read.tree(“RAxML.tree”) |
| ggtree() | Draws phylogenetic tree from phylo object | ggtree | p = ggtree(tree) |
| %<+% | |||
| Command | Description | Package | Example |
|---|---|---|---|
| abbreviate() | Abbreviates strings (e.g. words) to at least minlength (e.g. 4) characters, such that they remain unique. | ||
save.image("workshop_ggtree_session.RData")